Efficient locking for multicore architectures

Authors

  • Jean-Pierre Lozi
  • Gaël Thomas
  • Julia Lawall
  • Gilles Muller
Abstract

The scalability of multithreaded applications on current multicore systems is hampered by the performance of critical sections, due in particular to the costs of access contention and cache misses. In this paper, we propose a new locking technique, Remote Core Locking (RCL), that aims to improve the performance of critical sections in legacy applications on multicore architectures. The idea of RCL is to replace lock acquisitions by optimized remote procedure calls to a dedicated server core. RCL limits the performance collapse observed with regular locks when many threads try to acquire a lock concurrently and removes the need to transfer lock-protected shared data to the core acquiring the lock: such data can typically remain in the server core's cache. Our microbenchmark shows that under high contention, RCL is always more efficient than the other state-of-the-art lock mechanisms, and a preliminary macrobenchmark evaluation shows performance gains on SPLASH-2 benchmarks (speedup up to 4.85) and on the Web cache application memcached (speedup up to 2.62).

Keywords: Locks, Multicore

Affiliations: LIP6/INRIA; DIKU, INRIA/LIP6; INRIA/LIP6

HAL identifier: hal-00641252, version 1, 15 Nov 2011

French title: Verrouillage efficace pour les architectures multicores
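
The delegation idea in the abstract (clients hand their critical sections to one dedicated server instead of acquiring a lock) can be illustrated with a minimal Python sketch. This is not the authors' implementation: RCL as described targets legacy applications with optimized remote procedure calls to a dedicated server core, whereas this sketch uses an ordinary thread and a standard queue, with no core pinning and no attention to cache behavior. The names DelegationServer, execute, increment, and run_clients are hypothetical and chosen only for illustration.

```python
import queue
import threading
from concurrent.futures import Future


class DelegationServer:
    """Hypothetical helper: a single thread runs every critical section sent to it."""

    def __init__(self):
        self._requests = queue.Queue()
        self._thread = threading.Thread(target=self._serve, daemon=True)
        self._thread.start()

    def _serve(self):
        # The server loop: take one request at a time and run it to completion,
        # so critical sections never overlap and the shared data stays with
        # the server thread.
        while True:
            item = self._requests.get()
            if item is None:  # shutdown sentinel
                break
            func, args, future = item
            try:
                future.set_result(func(*args))
            except Exception as exc:
                future.set_exception(exc)

    def execute(self, func, *args):
        """Stand-in for lock(); critical section; unlock(): ship the work and wait."""
        future = Future()
        self._requests.put((func, args, future))
        return future.result()

    def shutdown(self):
        self._requests.put(None)
        self._thread.join()


counter = {"value": 0}  # shared state, only ever touched by the server thread


def increment(amount):
    # The critical section: no lock is needed because only the server runs it.
    counter["value"] += amount
    return counter["value"]


def run_clients(server, n_threads=8, n_ops=1000):
    def client():
        for _ in range(n_ops):
            server.execute(increment, 1)

    threads = [threading.Thread(target=client) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


server = DelegationServer()
run_clients(server)
server.shutdown()
print(counter["value"])  # prints 8000
```

The sketch shows only the structural point: because every update to the shared counter runs on the server thread, the clients never contend for a lock and the counter never has to move between them. It says nothing about the performance figures reported in the abstract.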

Similar resources

Design of a novel congestion-aware communication mechanism for wireless NoC architecture in multicore systems

Hybrid Wireless Network-on-Chip (WNoC) architecture has emerged as a scalable communication structure to mitigate the deficits of traditional NoC architecture for future multicore systems. The hybrid WNoC architecture provides energy-efficient, high-data-rate and flexible communications for NoC architectures. In these architectures, each wireless router is shared by a set of processing core...

Thread Cooperation in Multicore Architectures for Frequency Counting over Multiple Data Streams

Many real-world data stream analysis applications such as network monitoring, click stream analysis, and others require combining multiple streams of data arriving from multiple sources. This is referred to as multi-stream analysis. To deal with high stream arrival rates, it is desirable that such systems be capable of supporting very high processing throughput. The advent of multicore processo...

Efficient Cache Locking at Private First-Level Caches and Shared Last-Level Cache for Modern Multicore Systems

Most modern computing systems have multicore processors with multilevel caches for high performance. Caches increase total power consumption and worsen execution time unpredictability. Studies show that way (or partial) cache locking may improve timing predictability and performance-to-power ratio for both single-core and multicore systems. Even though both private first-level and shared ...

SIP server performance on multicore systems

C. P. Wright, E. M. Nahum, D. Wood, J. M. Tracey, E. C. Hu. This paper evaluates the performance of a popular open-source Session Initiation Protocol (SIP) server on three different multicore architectures. We examine the baseline performance and introduce three analysis-driven optimizations that involve increasing the number of slots in hash tables, an in-memory database for user ...

Efficient Wavelet Tree Construction and Querying for Multicore Architectures

Wavelet trees have become very useful to handle large data sequences efficiently. By the same token, in the last decade, multicore architectures have become ubiquitous, and parallelism in general has become extremely important in order to gain performance. This paper introduces two practical multicore algorithms for wavelet tree construction that run in O(n) time using lg σ processors, where n ...

Publication date: 2011